[% setvar title docs/pdds/pdd08_keys.pod - Indexing Aggregate PMCs %]
<div id="archive-notice">
    <h3>This file is part of the Perl 6 Archive</h3>
    <p>To see what is currently happening visit <a href="http://www.perl6.org/">http://www.perl6.org/</a></p>
</div>
<div class='pod'>
<a name='NAME'></a><h1>NAME</h1>
<p>docs/pdds/pdd08_keys.pod - Indexing Aggregate PMCs</p>
<a name='ABSTRACT'></a><h1>ABSTRACT</h1>
<p>This PDD aims to clear up the confusion regarding the implementation of
keyed access to PMCs in Parrot.</p>
<a name='DESCRIPTION'></a><h1>DESCRIPTION</h1>
<p>First, let's define some terminology. An <b>aggregate PMC</b> is one which
allows indexed access to sub-element that it stores or references, by
supporting and implementing the <code>_keyed</code> variants of vtable methods.
These variants are <b>indexing</b> operations, as they act on a specific
indexed element of an aggregate PMC.</p>
<p>Non-aggregates also support <code>_keyed</code> variants of the vtable methods,
but they may not do anything particularly clever - for instance,
PMC types implementing Perl references will merely pass the index on to
the referent. These aren't aggregates because they don't directly store
or reference sub-elements.</p>
<p>Indexing operations take one or more aggregate-<b>key</b> atoms. At runtime,
these operations will index the key into the aggregate returning a
<b>value</b>. Here is a well-known indexing operation in Perl 6:</p>
<pre>    @a[12] = $b;</pre>
<p>The key here is the constant integer <code>12</code>, and the aggregate is the
<code>PerlArray</code> <code>@a</code>. In the process of this assignment, Parrot will have
to extract the PMC in element 12 of the array, producing a value. <code>$b</code>
is then assigned to this value.</p>
<p>Now, how does this all get implemented?</p>
<a name='IMPLEMENTATION'></a><h1>IMPLEMENTATION</h1>
<a name='The key structure'></a><h2>The key structure</h2>
<p>The key structure has to have the following features: it must bundle up
multiple keys so that multidimensional aggregates can be indexed; keys
may be specified as integer, string, number or PMC.</p>
<p>Hence, the following structure was produced. First, the individual keys
as we think of them from a language level are each stored in a Key PMC
with the key type encoded in the private flags of the PMC and the value
stored in the PMCs cache.</p>
<p>So, for instance, indexing the multidimensional array
<code>@foo[$a;12;&quot;hi&quot;]</code> produces three PMCs, one with a PMC type,
one with an integer type and one with a string type.</p>
<p>As stated, the key type is encoded in the PMC flags using 5 bits based
on the following scheme:</p>
<pre>    typedef enum {
        KEY_integer_FLAG = PMC_private0_FLAG,
        KEY_number_FLAG = PMC_private1_FLAG,
        KEY_string_FLAG = PMC_private2_FLAG,
        KEY_pmc_FLAG = PMC_private3_FLAG,
        KEY_register_FLAG = PMC_private4_FLAG,

        KEY_type_FLAGS = KEY_integer_FLAG |
                         KEY_number_FLAG |
                         KEY_string_FLAG |
                         KEY_pmc_FLAG |
                         KEY_register_FLAG
    } KEY_flags;</pre>
<p>The <code>KEY_register_FLAG</code> is used to indicate that the <code>int_val</code> component
of the PMC's cache contains the number of a register of the appropriate type
that itself contains the value of the key component.</p>
<p>The next issue is to combine these things into a single key. This is done
by using the <code>data</code> member of the PMC to form a linked list so that each
key points to the next key. A linked list is used so that partial keys can
be easily generated as a multidimensional data structure is traversed.</p>
<p>These definitions, along with declarations of support routines used to
manipulate keys, can be found in <b><i>include/parrot/key.h</i></b></p>
<a name='Aggregate and non-aggregate PMCs'></a><h2>Aggregate and non-aggregate PMCs</h2>
<p>We've already said that what separates the aggregate PMCs from the
non-aggregates is their implementation of the <code>_keyed</code> vtable methods.
So it is Hereby Decreed that the default vtable which everyone inherits
from defines the <code>_keyed</code> forms to throw an exception.</p>
<ul>
<li><a name='Todo'></a>Todo</li>
<p>Need discussion on whether <code>OUT_OF_BOUNDS</code> is a good exception for
this, or whether something else should be used. It's really a compiler
screw-up, since code which indexes a non-aggregate shouldn't be
generated.</p>
</ul>
<a name='_keyed vtable methods'></a><h2><code>_keyed</code> vtable methods</h2>
<p>So what of these magical <code>_keyed</code> vtable methods? They are generated
when you add the <code>keyed</code> tag to the appropriate entry in <b><i>vtable.tbl</i></b>.
They are constructed by following <b>every</b> <code>PMC</code> argument with a second
<code>PMC</code> argument which acts as the key for that argument; the name of the
second <code>PMC</code> argument is formed by adding <code>_key</code> onto the end of the
first <code>PMC</code> argument.</p>
<p>The reason why every PMC argument has an associated key is twofold.
Firstly, it means that</p>
<pre>    @a[$b] = $c</pre>
<p>and</p>
<pre>    $a = @b[$c]</pre>
<p>use the same vtable method, reducing the multiplicity of methods.
Secondly, a three-argument <code>assign</code> as suggested by the code above
would be ambiguous - the code above uses 3 PMCs in different ways.</p>
<p>Also, operations which take an aggregate key for one of their
arguments should take aggregate keys for <b>all</b> of their arguments.
This is to avoid the following:</p>
<pre>    void foo_keyed_i(PMC* x, PMC* x_key, INT a)
    void foo_keyed_n(PMC* x, PMC* x_key, NUM a)
    void foo_keyed_p(PMC* x, PMC* x_key, PMC a)
    void foo_keyed_p_keyed(PMC* x, PMC* x_key, PMC* a, PMC* a_key)</pre>
<p>These are all replaced with the single entry</p>
<pre>    void foo_keyed(PMC* x, PMC* a_key, PMC* a, PMC* a_key)</pre>
<p>(Think how much worse it gets when there are three or more PMCs in an
entry...)</p>
<p>Yes. This means that you may need to turn some things into <code>PMC</code>s that
you didn't want to. Since the alternative is mega pollution and
duplication in the vtable table, and since the majority of things that
you'll deal with in a real world situation are expected to be <code>PMC</code>s
anyway, this shouldn't be too much of a problem.</p>
<p>So, if you have a PMC in a <code>_keyed</code> method which you don't want to
index, pass in <code>NULL</code> instead of a real key. Code implementing these
methods should understand <code>PMC* foo, PMC* NULL</code> as meaning the
entirety of <code>foo</code> in some sense; this is trivial to understand if
<code>foo</code> is non-aggregate, and implementation-defined if <code>foo</code> is
aggregate. If you remember that a key PMC is really a linked list,
you'll notice that after traversing down through the list, you'll
reach a <code>NULL</code> which again means the entirety of whatever object you
traversed to.</p>
<p>Similarly, non-<code>_keyed</code> methods on aggregates are implementation
defined; for instance, a <code>set_integer</code> on a <code>PerlArray</code> may be
understood as setting <code>@array.length</code>.</p>
<p>Historically, we first implemented keys as two separate keyed methods
per applicable method - <code>..._index</code> and <code>..._index_s</code> for integer and
string indexing respectively. However, this didn't give us the
flexibility and scalability that key structures give us.</p>
<a name='Input to the assembler'></a><h2>Input to the assembler</h2>
<p>There are several different valid specifications of an aggregate key
to the assembler. These are:</p>
<pre>    op arg, P1[1234]  # Constant integer key
    op arg, P1[I1]    # Integer key

    op arg, P1[12.34] # Constant number key - handled as constant key
    op arg, P1[&quot;foo&quot;] # Constant string key - handled as constant key
    op arg, P1[I1;I2] # Multi-level key - handled as constant key

    op arg, P1[P1]    # Register key</pre>
<p>(Rationale: fits programmer's expectation, easier to understand at a
glance than <code>op P1, P2, P3</code>. Also, is <code>op P1, P2, P3</code> the same as
<code>op P1[P2], P3</code> or <code>op P1, P2[P3]</code>, or are these three separate PMCs?)</p>
<p>In all there are four types of key. The first two are integer keys and
constant integer keys which are optimisations for the common case of
single level integer keys.</p>
<p>The other two are constant keys, which can handle any combination of
constants and registers with any number of levels; and register keys
which are represented by a single PMC register that is assumed to
contain a PMC of the Key class.</p>
<a name='What the assembler did next'></a><h2>What the assembler did next</h2>
<p>When the assembler sees an aggregate key, it &quot;detaches&quot; the key to
form a separate argument. It then decides on the type of key. For
integer keys (both constant and register) the data is encoded in the
same way as an ordinary integer argument. For register keys the data
is encoded as for an ordinary PMC register argument, while for constant
keys a key constant is generated that encodes the list of constants and
registers that make up the key and an appropriate index into the constant
table is encoded as the argument.</p>
<p>Next it selects the appropriate op. Register keys have the signature
<code>k</code> and constant keys have the signature <code>kc</code>. Integer register and
constant keys are encoded as <code>ki</code> and <code>kic</code> respectively.</p>
<pre>    set P1[&quot;hi&quot;], 1234</pre>
<p>finds an op named <code>set_p_kc_i</code>. On the other hand,</p>
<pre>    set P1[P1], 1234</pre>
<p>produces an op named <code>set_p_k_i</code>. Likewise, this:</p>
<pre>    set P1[1], 1234</pre>
<p>produces an op named <code>set_p_kic</code>, and this:</p>
<pre>    set P1[I1], 1234</pre>
<p>produces an op named <code>set_p_ki</code>.</p>
<a name='Bytecode representation'></a><h2>Bytecode representation</h2>
<p>The bytecode representation of these keys are as follows: constant keys
are treated just like another constant, and are an index into the packfile's
constant table.</p>
<p>Each key in that constant table consists of one word specifying its
length in terms of number of keys. For instance, <code>[&quot;hi&quot;]</code> has
length 1; <code>[&quot;hi&quot;;P1;S1;123]</code> has length 4. Next, each key is
specified using two words. The first word is a type specifier:</p>
<pre>    1 - Integer constant
    2 - Number constant
    4 - String contant
    7 - Integer register
    8 - Number register
    9 - PMC register
   10 - String register</pre>
<p>and the second word is either a value (for integer constants), a register
number (for registers) or an index into the appropriate constant table.</p>
<p>The type values shown above are actually the <code>PARROT_ARG_*</code> values taken
from <b><i>include/parrot/op.h</i></b>.</p>
<a name='VERSION'></a><h1>VERSION</h1>
<a name='CURRENT'></a><h2>CURRENT</h2>
<pre>   Maintainer: Simon Cozens &lt;<a href='mailto:simon@netthink.co.uk'>simon@netthink.co.uk</a>&gt;
   Class: Internals
   PDD Number: 8
   Version: 1.3
   Status: Developing
   Last Modified: 25 August, 2002
   PDD Format: 1
   Language: English</pre>
<a name='HISTORY'></a><h2>HISTORY</h2>
<ul>
<li><a name='Sun Aug 25 11:14:43 GMT 2002 : Version 1.3'></a>Sun Aug 25 11:14:43 GMT 2002 : Version 1.3</li>
<p>Updated to reflect Dan's decision to change keys to use PMCs instead
of a custom data structure. Also corrects documentation of multi-level
keys and how they are compiled and work. <a href='mailto:tom@compton.nu.'>tom@compton.nu.</a></p>
<li><a name='Thu Apr 25 18:30:36 UTC 2002 : Version 1.2'></a>Thu Apr 25 18:30:36 UTC 2002 : Version 1.2</li>
<p>Renamed <code>KEY_PAIR</code> to <code>KEY_ATOM</code>, updated to reflect changeover to
linked list. - <a href='mailto:steve@fink.com'>steve@fink.com</a></p>
<li><a name='Fri Mar 8 18:47:34 GMT 2002 : Version 1.1'></a>Fri Mar  8 18:47:34 GMT 2002 : Version 1.1</li>
<p>updated to reflect Dan's comments that non-aggregates also support
<code>_keyed</code> variant vtable methods.</p>
</ul>
</div>
